Infection, Genetics and Evolution 32 (2015) 416-424 





Contents lists available at ScienceDirect 

Infection, Genetics and Evolution 

journal homepage: www.elsevier.com/locate/meegid 



Genomic and single nucleotide polymorphism analysis of infectious 
bronchitis coronavirus 

Celia Abolnik 

Poultry Section, Department of Production Animal Studies, Faculty of Veterinary Science, University of Pretoria, Onderstepoort 0110, South Africa 





ARTICLE INFO 


ABSTRACT 


Article history: 

Received 5 December 2014 
Received in revised form 13 March 2015 
Accepted 26 March 2015 
Available online 3 April 2015 


Keywords: 

Infectious bronchitis 
Coronavirus 


SNP 

Spike 

HVR 


Infectious bronchitis virus (IBV) is a Gammacoronavirus that causes a highly contagious respiratory disease in chickens. A QX-like strain was analysed by high-throughput Illumina sequencing, and genetic variation across the entire viral genome was explored at the sub-consensus level by single nucleotide polymorphism (SNP) analysis. Thirteen open reading frames (ORFs) were predicted, in the order 5'UTR-1a-1ab-S-3a-3b-E-M-4b-4c-5a-5b-N-6b-3'UTR. The relative frequencies of missense:silent SNPs were calculated to obtain a comparative measure of variability in specific genes. The most variable ORFs, in descending order, were E, 3b, 5'UTR, N, 1a, S, 1ab, M, 4c, 5a and 6b. The E and 3b protein products play key roles in coronavirus virulence. RNA folding demonstrated that, with one exception, the mutations in the 5'UTR did not substantially alter the predicted secondary structure. The frequency of SNPs in the Spike (S) protein ORF (0.67%) was below the genomic average of 0.76%. Only three SNPs were identified in the S1 subunit, none of which were located in hypervariable region (HVR) 1 or HVR2. The S2 subunit was considerably more variable, containing 87% of the polymorphisms detected across the entire S protein. The S2 subunit also contained a previously unreported multi-A insertion site and a stretch of four consecutive mutated codons, which mapped to the stalk region of the spike protein. Template-based protein structure modelling produced the first theoretical model of the IBV spike monomer. Given the lack of diversity observed at the sub-consensus level, the tenet that the HVRs in the S1 subunit are very tolerant of amino acid changes produced by genetic drift is questioned.

© 2015 Elsevier B.V. All rights reserved. 


1. Introduction 

Coronaviruses (family Coronaviridae, order Nidovirales) are enveloped, single-stranded RNA viruses with large genomes of ~25-30 kb. The family is split into four genera: Alpha-, Beta-, Gamma- and Deltacoronaviruses. Each genus contains pathogens of veterinary or human importance. A current evolutionary model postulates that bats are the ancestral source of Alpha- and Betacoronaviruses, and birds the source of Gamma- and Deltacoronaviruses (Woo et al., 2012). The Alphacoronaviruses infect swine, cats, dogs and humans. Betacoronaviruses infect diverse mammalian species, including bats, humans, rodents and ungulates. The SARS coronavirus (SARS-CoV) is a Betacoronavirus; it verged on a pandemic in 2003, with 8273 human cases and 755 deaths. Another member of this genus is the recently discovered Middle East Respiratory Syndrome coronavirus (MERS-CoV). It has claimed 88 human lives from 212 cases since April 2012, and dromedary camels are the suspected reservoir (Briese et al., 2014). The genus Gammacoronavirus includes strains infecting birds and whales (Woo et al., 2012; McBride et al., 2014; Borucki et al., 2013), and deltacoronaviruses have been described in birds, swine and cats (Woo et al., 2012). The diversity of hosts and genomic features amongst CoVs has been attributed to three factors: their unique mechanism of viral replication, a high frequency of recombination and an inherently high mutation rate (Lai and Cavanagh, 1997).

Infectious bronchitis virus (IBV) is a gammacoronavirus that causes a highly contagious respiratory disease of economic importance in chickens (Cook et al., 2012). IBV replicates primarily in the respiratory tract. Depending on the strain, it also replicates in epithelial cells of the gut, kidney and oviduct. Respiratory distress, interstitial nephritis and reduced egg production are common clinical signs, and the disease has a global distribution (Cavanagh, 2007; Cook et al., 2012). The IBV genome encodes at least ten open reading frames (ORFs), organised as follows: 5'UTR-1a-1ab-S-3a-3b-E-M-5a-5b-N-3'UTR. Six mRNAs (mRNAs 1-6) are associated with the production of progeny virus. mRNAs 2, 3, 4 and 6 encode the four structural proteins: the spike glycoprotein (S), the small membrane protein (E), the membrane glycoprotein (M) and the nucleocapsid protein (N), respectively (Casais et al., 2005; Hodgson et al., 2006). Messenger RNA (mRNA) 1 consists of ORF1a and ORF1b, which encode two large polyproteins via a ribosomal frameshift mechanism (Inglis et al., 1990). During or after synthesis, these polyproteins are cleaved into 15 non-structural proteins (nsp2-16), which are associated with RNA replication and transcription. The host serine protease furin cleaves the S glycoprotein post-translationally at a protease cleavage recognition motif (de Haan et al., 2004). Cleavage yields the amino-terminal S1 subunit (92 kDa) and the carboxyl-terminal S2 subunit (84 kDa). The multimeric S glycoprotein extends from the viral membrane, and the S2 subunit anchors the globular S1 subunit to the viral membrane via non-covalent bonds. Proteins 3a and 3b are encoded by mRNA 3, and proteins 5a and 5b by mRNA 5; none of these four proteins is essential to viral replication (Casais et al., 2005; Hodgson et al., 2006).

A confounding feature of IBV infection is the lack of correlation between antibodies and protection. In vitro strain differentiation by virus neutralization (VN) tests also often disagrees with in vivo cross-protection results. In addition, high viral shedding can occur in the presence of high titres of circulating antibodies. Together, these observations show that other immune mechanisms are involved, and the roles of cell-mediated immunity and interferon have been demonstrated experimentally (Timms et al., 1980; Collisson et al., 2000; Pei et al., 2001; Cook et al., 2012).

Dozens of poorly cross-protective IBV serotypes have been discovered and studied by VN tests and by molecular characterisation of the S protein gene. Most of these serotypes differ from each other by 20-25% at the amino acid level in S1, but they may differ by up to 50%. S1 contains the epitopes that induce neutralizing, serotype-specific and hemagglutination-inhibiting antibodies (Cavanagh, 2007; Darbyshire et al., 1979; Farsang et al., 2002; Ignjatovic and McWaters, 1991; Meulemans et al., 2001; Gelb et al., 1997). Most of the strain differences in S1 occur in three hypervariable regions (HVRs), located at amino acid residues 56-69 (HVR1), 117-131 (HVR2) and 274-387 (HVR3) (Moore et al., 1997; Wang and Huang, 2000). Monoclonal antibody analysis placed many of the amino acids that form VN epitopes within the first and third quarters of the linear S1 polypeptide (De Wit, 2000; Kant et al., 1992; Koch et al., 1990). Closely related strains (>95% amino acid identity) also differ in these regions (Bijlenga et al., 2004; Farsang et al., 2002). Cavanagh (2007) proposed that these parts of the S1 subunit are very tolerant of amino acid changes, which confers a selective advantage. Recently, the receptor-binding domain of the IBV M41 strain was mapped to residues 16-69 at the N terminus of S1, which overlaps with HVR1 (Promkuntod et al., 2014).

The S2 subunit drives virus-cell fusion. It is more conserved between serotypes than S1, varying by only 10-15% at the amino acid level (Bosch et al., 2005; Cavanagh, 2005). S2 was initially thought to play little or no role in inducing a host immune response. However, an immunodominant region in the N-terminal half of S2 has since been shown to induce neutralizing, but not serotype-specific, antibodies. Evidence for this comes from the ability of this subunit to confer broad protection against challenge with an unrelated serotype (Kusters et al., 1989; Toro et al., 2014).

IBVs evolve continuously as a result of (a) frequent point mutations and (b) genomic recombination events (Cavanagh et al., 1992; Kottier et al., 1995; Jackwood et al., 2005; Zhao et al., 2013; Kuo et al., 2013; Liu et al., 2014). Multiple studies on IBV diversity have focused on inter-serotypic and inter-strain variation. A few have examined sub-populations within the S1 subunit in vaccine strains (Gallardo et al., 2012; Ndegwa et al., 2014). The present study aimed to explore genetic variation across the entire viral genome at the sub-consensus level. Based on the published literature, certain regions were expected to display significant sub-genomic variation, particularly the S1 subunit HVRs. The study focused on a QX-like strain, a serotype currently causing significant poultry health problems across Europe, Asia, South America and South Africa.

2. Materials and methods 

2.1. Origin and isolation of QX-like strain ck/ZA/3665/11

Twenty-eight-day-old chickens in a commercial broiler operation presented with acute lethargy, reduced feed consumption and mortality. Post mortem, tracheitis and swollen kidneys were noted, as well as a secondary Escherichia coli infection. The worst affected houses had mortality rates of 19.8%, 11.9% and 10.2%. IBV was isolated in specific-pathogen-free (SPF) embryonated chicken eggs (ECE) as described by Knoetze et al. (2014). After two initial passages in ECE, the virus was passaged twice more at the University of Pretoria.

2.2. Preparation of the genome and Illumina sequencing 

RNA was extracted from allantoic fluid using TRIzol® reagent (Ambion, Life Technologies, Carlsbad, USA) according to the manufacturer’s protocol. The genome was reverse transcribed into cDNA and amplified using a TransPlex® Whole Transcriptome Amplification kit (Sigma-Aldrich, Steinheim, Germany). Illumina MiSeq sequencing of the cDNA library was performed at the ARC-Biotechnology Platform, Onderstepoort, Pretoria.

2.3. Genome assembly, RNA folding and recombination analysis 

Illumina results were analysed using CLC Genomics Workbench v5.1.5. Paired-end reads were trimmed, and a preliminary de novo assembly was performed. The larger contigs were analysed by BLAST to identify the closest genomic reference strain (ITA/90254/2005, CAZ86699). This strain was retrieved and used as a scaffold for assembly-to-reference, which generated a consensus sequence for 3665/11. Trimmed paired-end reads were also mapped against other IBV serotype genomes, which confirmed that strain 3665/11 was a pure culture of a QX-like IBV. The genome was deposited in GenBank under accession number KP662631. RNA folding was predicted using CLC Genomics Workbench v5.1.5. Genetic recombination in the consensus sequence was evaluated using the recombination detection program RDP v4.31.

2.4. Genome annotation and single nucleotide polymorphism (SNP) 
analysis 

Coding sequences and ORFs were predicted in VIGOR (Wang et al., 2010). For SNP detection, trimmed paired-end reads were re-mapped against the 3665/11 consensus sequence. A SNP detection table was generated in CLC Genomics Workbench and edited manually to eliminate all SNPs with a frequency of <5%. This conservative cut-off was chosen to eliminate non-specific PCR errors introduced during transcriptome library preparation or deep sequencing. It also excluded most of the point insertions that produced gaps and frameshift mutations across the genome. Nucleotide substitutions in coding regions were inspected manually for changes to the consensus amino acid (Table 1, Supplementary data). Motifs were predicted using ELM, the Eukaryotic Linear Motif Resource for Functional Sites in Proteins (Dinkel et al., 2014).
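The filtering and annotation above were done in CLC Genomics Workbench and by manual inspection. The Python sketch below only illustrates the logic and rests on several assumptions. The `SnpCall` record and its fields are hypothetical. Frequency is taken as variant-supporting reads divided by coverage, the two quantities that Table 1 reports. The standard genetic code is assumed. The extra "nonsense" label is added here for completeness and is not used in the source.

```python
from dataclasses import dataclass
from itertools import product

# Standard genetic code (assumption), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


@dataclass
class SnpCall:
    """Hypothetical SNP record; pos_in_orf is 1-based within the ORF."""
    pos_in_orf: int
    ref: str
    alt: str
    count: int      # reads supporting the variant allele
    coverage: int   # total reads at this position

    @property
    def frequency(self) -> float:
        return self.count / self.coverage


def filter_calls(calls, min_freq=0.05):
    """Keep only calls at or above the 5% frequency cut-off."""
    return [c for c in calls if c.frequency >= min_freq]


def classify(orf: str, call: SnpCall) -> str:
    """Return 'silent', 'missense' or 'nonsense' for a substitution in an ORF."""
    i = call.pos_in_orf - 1
    if orf[i] != call.ref:
        raise ValueError(f"reference mismatch at {call.pos_in_orf}")
    start = (i // 3) * 3                       # first base of the affected codon
    codon = orf[start:start + 3]
    offset = i - start
    mutant = codon[:offset] + call.alt + codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        return "silent"
    return "nonsense" if alt_aa == "*" else "missense"


if __name__ == "__main__":
    orf = "ATGGCTTAA"  # toy ORF (Met-Ala-stop), not IBV sequence
    calls = [
        SnpCall(6, "T", "C", count=30, coverage=200),  # GCT -> GCC (Ala -> Ala)
        SnpCall(5, "C", "A", count=12, coverage=100),  # GCT -> GAT (Ala -> Asp)
        SnpCall(4, "G", "T", count=3, coverage=100),   # 3%: removed by the cut-off
    ]
    for call in filter_calls(calls):
        print(call.pos_in_orf, round(call.frequency, 3), classify(orf, call))
```

Silent versus missense status depends only on the codon containing the substitution, so the sketch translates just that codon rather than the whole ORF.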
2.5. Protein secondary structure prediction

Protein structures for S1 and S2 were predicted in RaptorX, a server that predicts three-dimensional (3D) structures for protein sequences without close homologs in the Protein Data Bank (PDB) (Kallberg et al., 2012). The S1 and S2 3D structures were annotated and superposed in CCP4MG v2.9.0 using the Secondary Structure Matching (SSM) superposition method. This method superimposes pairs of structures in six steps:
(1) it finds the secondary structure elements (SSEs) and represents each as a simple vector spanning the length of the SSE;
(2) it finds equivalent SSEs in the two structures by graph-theory matching, using geometric criteria of distances and angles between the vectors;
(3) it superimposes the vectors representing equivalent SSEs;
(4) it finds the most likely equivalent residues in the superposed SSEs;
(5) it superimposes the Cα atoms of equivalent residues; and
(6) it iterates the last two steps.
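Step (5) is a least-squares rigid-body fit of paired Cα coordinates, which step (6) repeats as the residue correspondence is refined. The Kabsch (SVD) algorithm is a standard way to compute such a fit. The NumPy sketch below shows that single fitting step on synthetic coordinates. It assumes the atom pairing is already known, omits the SSE graph matching of steps (1)-(4), and is not CCP4MG's SSM code.

```python
import numpy as np


def kabsch_superpose(mobile: np.ndarray, target: np.ndarray):
    """Least-squares rotation/translation of `mobile` (N x 3) onto `target` (N x 3).

    Rows are assumed to be already-paired equivalent C-alpha atoms.
    Returns (rotation matrix, fitted coordinates, RMSD).
    """
    mob_centroid = mobile.mean(axis=0)
    tgt_centroid = target.mean(axis=0)
    P = mobile - mob_centroid              # centre both sets at the origin
    Q = target - tgt_centroid
    H = P.T @ Q                            # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    fitted = P @ R.T + tgt_centroid        # apply R to every row, then translate
    rmsd = float(np.sqrt(((fitted - target) ** 2).sum(axis=1).mean()))
    return R, fitted, rmsd


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.normal(size=(12, 3))      # synthetic "C-alpha" coordinates
    theta = 0.7
    rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                      [np.sin(theta), np.cos(theta), 0.0],
                      [0.0, 0.0, 1.0]])
    mobile = target @ rot_z.T + np.array([5.0, -2.0, 1.0])
    _, _, rmsd = kabsch_superpose(mobile, target)
    print(f"RMSD after superposition: {rmsd:.2e}")
```

Because the synthetic mobile set is an exact rotation and translation of the target, the fitted RMSD should be essentially zero. Real S1/S2 models would give a non-zero residual.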

3. Results and discussion 

3.1. Genome assembly and annotation 

The genome sequence of QX-like strain ck/ZA/3665/11 was assembled from 74,578 IBV-specific paired-end reads of 144 bp each. The genome was 27,388 nt in length, with the 5' UTR incomplete by ~139 nt. VIGOR predicted thirteen ORFs in the order 5'UTR-1a-1ab-S-3a-3b-E-M-4b-4c-5a-5b-N-6b-3'UTR (Fig. 1). This genome organisation, including 4b, 4c and 6b, was similar to that of turkey coronavirus (TCoV; Cao et al., 2008). ORFs 4b, 4c and 6b were also predicted in Australian IBV strains (Hewson et al., 2011). A similar genome arrangement was detected when the sequences of a QX-like strain (JQ088078) and ArkDPI (EU418976) were analysed using VIGOR. Mass41 (AY851295), however, did not contain the predicted 4b, 4c and 6b ORFs (data not shown).

ORF 4b was 94 amino acids (aa) in length, and no SMART domains were predicted. ORF 4c was 56 aa in length and contained a low-complexity region. The 6b ORF encoded a 73 aa protein with a predicted signal peptide at residues 1-24 and two predicted transmembrane domains at residues 2-25 and 35-57. No recombination was detected across the genome of QX-like strain ck/ZA/3665/11.

A gap was present between nucleotides 1666 and 1667 (~aa 370 in the 1a ORF; Table 1, Supplementary data). The gap was present in the majority (74.6%) of reads. However, it introduced a frameshift that split ORF 1a into two, so the sequence for strain 3665/11 deposited in GenBank contains the minority adenine residue. The gap may be a legitimate mutation, but until further transcriptional analyses are conducted, ORF 1a is reported here as intact.

3.2. Comparative frequency of mutations in ORFs 

Two hundred and eight SNPs across the IBV QX-like genome were evaluated at the selected cut-off value. For each SNP, Table 1 lists the consensus reference alongside:
- the allele variations;
- the relative frequencies of these point mutations;
- the actual counts and coverage at that position;
- the corresponding ORF or region; and
- the mutational effect.
Coverage ranged from 4-fold (position 11,540) to 4587-fold (position 26,637).

The relative frequencies of missense:silent SNPs in relation to ORF length were calculated to obtain a comparative measure of variability in specific genes (Table 2). Fig. 2 illustrates the results for the structural genes and polymerase. Fig. 3 presents the results for the non-structural protein ORFs and non-coding regions, which were much shorter. In terms of total SNPs, the most variable ORFs, in descending order, were E, 3b, 5' UTR, N, 1a, S, 1ab, M, 4c, 5a and 6b. No SNPs were detected at the 5% cut-off in the 3a and 3' UTR regions. In terms of SNPs leading to missense mutations, the most variable, in descending order, were 3b, E, 5' UTR, 1a, N, M, 5a, 1ab, S, 4b and 3a/4c/6b. These mutations presumably did not disrupt the tertiary protein structure and might be advantageous to the virus. The proportion of synonymous mutations indicates which ORFs are under the strongest selective constraint; in descending order, these were 4c, 1ab, N, S, E and 3a/3b/M/4b/5a/6b.
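The relative frequencies in Table 2 are simple proportions. SNP counts are divided by ORF length, and silent SNPs are divided by total SNPs. The minimal Python sketch below recomputes the E and N rows and the genome average from the counts in Table 2. The helper name `relative_frequencies` is hypothetical, and rounding to two decimals is assumed to match the table.

```python
def relative_frequencies(length_nt: int, missense: int, silent: int, genome_nt: int = 27_388):
    """Percentages in the style of Table 2, rounded to two decimals."""
    total = missense + silent
    return {
        "length_pct_genome": round(100 * length_nt / genome_nt, 2),
        "total_pct_orf": round(100 * total / length_nt, 2),
        "missense_pct_orf": round(100 * missense / length_nt, 2),
        "silent_pct_orf": round(100 * silent / length_nt, 2),
        "silent_pct_snps": round(100 * silent / total, 2) if total else None,
    }


# (ORF length in nt, missense SNPs, silent SNPs), taken from Table 2
ORFS = {"E": (333, 5, 1), "N": (1230, 8, 5)}

for name, (length, mis, sil) in ORFS.items():
    print(name, relative_frequencies(length, mis, sil))

print("genome average:", round(100 * 208 / 27_388, 2))  # 208 SNPs over 27,388 nt -> 0.76
```

This makes the comparison in the text explicit. Normalising by ORF length lets a short ORF such as E (333 nt) be compared with much longer genes such as N.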

3.2.1. E protein 

The E protein ORF had considerably more missense mutations than the other genes, at a frequency of 1.5% of the ORF. This is nearly three times the average value (0.55%) for the 1a, 1ab, S, M and N genes. The E protein gene was the most variable at the sub-consensus level, with five missense mutations and only one silent mutation across its 333 bp ORF. Despite its small size, the CoV E protein drastically influences CoV replication and pathogenicity. In SARS-CoV, experiments showed that the E protein is not essential for genome replication or subgenomic mRNA synthesis. It does, however, affect morphogenesis, budding, assembly, intracellular trafficking and virulence. In fact, in SARS-CoV the E protein is the main antagonist associated with the induction of lung inflammation, which causes the acute respiratory distress syndrome from which the virus derives its name (DeDiego et al., 2014). No studies have been published for the IBV E protein. The high variability demonstrated here nevertheless suggests that it may be an important virulence factor in poultry. A higher mutation rate may also provide an evolutionary advantage in overcoming host cellular immune responses.

3.2.2. N protein 

The N protein gene contained one of the highest overall frequencies of SNPs (1.06%). Nevertheless, 38.9% of these mutations (0.41% of the gene) were silent, which shows that the N gene is under greater selective pressure. The coronavirus N protein is multifunctional. It plays vital roles in viral assembly and in formation of the complete virion, and it is required for optimal viral replication. The CoV N protein is also implicated in cell cycle regulation and host translational shutoff, displays chaperone activity and activates host signal transduction. In addition, it aids viral pathogenesis by antagonising interferon induction (reviewed by McBride et al., 2014). Given its fundamental roles in RNA binding, in forming the ribonucleoprotein complex and in the virion, it is not surprising that N is the most conserved structural protein. This is evidenced by its gene having the highest ratio of silent mutations among the structural protein genes analysed. Kuo et al. (2013) demonstrated the importance of maintaining sequence integrity in the IBV N protein. They reported that two residues within the N-terminal domain of a Taiwanese IBV strain were positively selected. Mutating either residue significantly reduced the affinity of the N protein for the viral transcriptional regulatory sequence.

Fig. 1. Genome organisation of QX-like IBV strain 3665/11.

Table 2
Relative frequencies of SNPs in QX-like IBV strain 3665/11.

Region/ORF | Length in nt (% of genome) | Total SNPs (% of ORF) | Missense SNPs (% of ORF) | Silent SNPs (% of ORF) [% of total SNPs]
5'UTR | 357 (1.30) | 4 (1.12) | 4 (1.12) | -
1ab | 19,872 (72.55) | 152 (1.47) | 112 (1.08) | 40 (0.39) [52.96]
S | 3438 (12.55) | 23 (0.67) | 15 (0.43) | 8 (0.23) [34.78]
3a | 174 (0.64) | 0 | 0 | 0
3b | 192 (0.70) | 3 (1.56) | 3 (1.56) | 0
E | 333 (1.22) | 6 (1.80) | 5 (1.50) | 1 (0.30) [16.67]
M | 681 (2.49) | 4 (0.59) | 4 (0.59) | 0
4b | 285 (1.04) | 1 (0.35) | 1 (0.35) | 0
4c | 171 (0.62) | 1 (0.58) | 0 | 1 (0.58) [100]
5a | 198 (0.72) | 1 (0.51) | 1 (0.51) | 0
N | 1230 (4.49) | 13 (1.06) | 8 (0.65) | 5 (0.41) [38.46]
6b | 222 (0.81) | 1 (0.45) | 1 (0.45) | 0
3'UTR | 272 (0.99) | 0 | 0 | 0
Total genome | 27,388 | 208 (0.76) | |

[Figure: bar chart of missense and silent SNP frequencies for ORFs 1a, 1ab, Spike, E, M and N]
Fig. 2. Relative frequencies of mutations in structural and polymerase ORFs of QX-like IBV strain 3665/11.

3.2.3. M protein 

The glycosylated amino terminus of the M protein lies on the outside of the virion, and M spans the membrane three times (Collisson et al., 2000). All four SNPs in the M gene resulted in missense mutations, two of which were located in the predicted transmembrane region. The M protein plays an important role in CoV virion formation. IBV M protein co-expressed with S assembled into virus-like particles (Liu et al., 2013), which confirms this role. CoV M proteins also interact with other proteins and perform other roles in the infected cell. For example, in MERS-CoV, M and the accessory proteins 4a, 4b and 5 were all found to prevent IFN-β synthesis. They did so by inhibiting interferon promoter activation and IRF-3 function, thus influencing disease outcome (Yang et al., 2013).

3.2.4. Accessory proteins 

Coronavirus accessory proteins are generally dispensable for virus replication. However, they play vital roles in virulence and pathogenesis in several ways: by affecting host innate immune responses, by encoding pro- or anti-apoptotic activities, or by affecting other signalling pathways that influence disease outcomes (Susan & Julian, 2011). IBV was shown to induce considerable activation of the type I IFN response. This activation was delayed relative to the peak of viral replication and the accumulation of viral dsRNA (Kint et al., 2014). IBV accessory proteins 3a and 3b help modulate this delayed IFN response by regulating interferon production at both the transcriptional and translational levels. Interestingly, proteins 3a and 3b seem to have opposing effects on IFN production in infected cells: 3a seems to promote IFN production, whereas 3b limits it. The two proteins thus antagonise each other to regulate IFN production tightly (Kint et al., 2014). Field isolates lacking 3a and 3b displayed reduced virulence in vitro and in vivo (Mardani et al., 2008). ORF 3a in strain 3665/11 lacked SNPs. By contrast, ORF 3b had the highest frequency of SNPs relative to its size among the accessory ORFs (n = 3; 1.56%).

[Figure: bar chart of missense and silent SNP frequencies for 5'UTR, 3a, 3b, 4b, 4c, 5a, 6b and 3'UTR]
Fig. 3. Relative frequencies of mutations in non-structural ORFs and non-coding regions of QX-like IBV strain 3665/11.

ORF 4b is present in many international IBV strains (Hewson et al., 2011; Bentley et al., 2013). It is rarely mentioned in the literature because no canonical transcription regulatory sequence (TRS-B) could be identified upstream of the encoding RNA. However, Bentley et al. (2013) demonstrated that IBV can produce subgenomic mRNAs from non-canonical TRS-Bs. This occurs via a template-switching mechanism with TRS-L, the conserved TRS in the leader sequence of the 5' UTR, and may expand the Gammacoronavirus repertoire of proteins. They specifically demonstrated transcription of the 4b ORF by this mechanism.
Fig. 4. Effects of mutations on the predicted RNA structures in the 5' UTR of QX-like IBV strain 3665/11.
Fig. 5. Schematic representation of the Spike protein of QX-like IBV strain 3665/11. Missense mutations are indicated as solid circles, and silent mutations as empty circles.


No studies have yet determined the functional role of ORF 4b in IBV pathogenesis, but the homolog in MERS-CoV is a potent interferon antagonist (Yang et al., 2013). In this study, a single SNP causing a missense mutation was present in 11.3% of the sub-consensus population of the 4b ORF. The single mutation in ORF 4c was silent, and the predicted 4c protein contained a low-complexity region. Low-complexity regions are stretches of protein sequence with biased amino acid composition. They may be involved in flexible binding associated with specific functions (Coletta et al., 2010).

ORF 6b, which encodes a 73 aa protein with a signal peptide and two transmembrane domains, was identified in the genome of strain 3665/11. ORF 6b has also been reported in TCoV and in Australian IBV strains (Cao et al., 2008; Hewson et al., 2011). The SARS-CoV homolog is 63 aa in length and was identified as an endoplasmic reticulum/Golgi membrane-localised protein that induces apoptosis. Apoptosis may play an important role in promoting CoV dissemination in vivo, minimising inflammation and helping the virus evade host defence mechanisms (Ye et al., 2010). SARS-CoV protein 6 accelerated the replication of murine CoV and increased the virulence of the originally attenuated virus (Tangudu et al., 2007). This accessory protein presumably plays a similar role in IBV pathogenesis, although this remains to be determined experimentally.

3.2.5. Untranslated regions (UTRs) 

The 3' UTRs of CoV genomes contain conserved cis-acting sequence and structural elements that play essential roles in RNA synthesis, gene expression and virion assembly. Each subgenomic RNA carries a 5' leader identical to the 5' end of the genome and shares the genomic 3' UTR (Goebel et al., 2004; Sola et al., 2011). No SNPs were detected in the 3' UTR in the sub-consensus sequences of strain 3665/11, which is consistent with the vital regulatory role of this region. Conversely, the partial 5' UTR sequence of strain 3665/11 was highly variable.

Fig. 6. Predicted structure of the Spike protein monomer of QX-like IBV strain 3665/11. Missense mutations in S1 (blue) and S2 (yellow) are indicated as coloured side chains.

The 139 un-sequenced nucleotides at the 5' end of the genome were extrapolated from the most similar genomic sequence, that of strain ITA/90254/2005. The secondary RNA structure of the 3665/11 5' UTR was then predicted by folding (Fig. 4). Each SNP was systematically substituted into the consensus sequence, and RNA folding was repeated. ΔG values for the predicted RNA secondary structures in Fig. 4(a)-(h) ranged from -182.9 kcal/mol to -185.5 kcal/mol. Apart from the 148 C to G mutation (Fig. 4(c)), the effects on RNA secondary structure were minor, and the structures in Fig. 4(b) and (d)-(h) were similar. To assess the effect of combining mutations, an RNA containing 148 T(U), 164 T(U), 169 G and 170 C was folded. It produced a stem-loop structure similar to those in Fig. 4(a), (b) and (d)-(h) (data not shown). Thus, only the 148 C to G mutation noticeably altered the predicted secondary structure of the 5' UTR.
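CLC Genomics Workbench folds RNA by free-energy minimisation, which is what produces the ΔG values above. The sketch below is a much simpler illustration of the substitute-and-refold procedure. It swaps in the Nussinov base-pair-maximisation algorithm, which ignores stacking energies, so its scores are pair counts rather than ΔG values. It also runs on a toy hairpin instead of the 3665/11 5' UTR. `to_rna`, `apply_snp` and `nussinov` are hypothetical helper names, and the minimum hairpin loop of 3 nt and the G-U wobble pairing rule are assumptions.

```python
# Watson-Crick plus G-U wobble pairs (assumed minimal pairing rule).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq: str) -> str:
    """Upper-case and convert DNA T to RNA U."""
    return seq.upper().replace("T", "U")


def apply_snp(seq: str, pos: int, ref: str, alt: str) -> str:
    """Substitute one base at 1-based `pos`, checking the reference allele."""
    seq, ref, alt = to_rna(seq), to_rna(ref), to_rna(alt)
    i = pos - 1
    if seq[i] != ref:
        raise ValueError(f"expected {ref} at {pos}, found {seq[i]}")
    return seq[:i] + alt + seq[i + 1:]


def nussinov(seq: str, min_loop: int = 3):
    """Maximise base pairs (Nussinov); return (pair count, dot-bracket string)."""
    n = len(seq)
    if n == 0:
        return 0, ""
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]                     # case 1: j left unpaired
            for k in range(i, j - min_loop):        # case 2: j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    # Traceback to recover one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return dp[0][n - 1], "".join(structure)


if __name__ == "__main__":
    consensus = to_rna("GGGAAATCC")                # toy hairpin, not IBV sequence
    print("consensus", nussinov(consensus))
    variant = apply_snp(consensus, 1, "G", "A")    # hypothetical SNP at position 1
    print("variant  ", nussinov(variant))
```

Comparing the consensus and variant structures mirrors, in miniature, the comparison of the Fig. 4 panels. An energy-based folding engine would be needed to reproduce ΔG values.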


3.3. The spike protein 

Twenty-three SNPs were identified in the 3438 bp spike protein ORF; 15 resulted in missense mutations at the amino acid level, and 8 were silent. At 0.67%, the frequency of total SNPs in the S protein ORF was below the genome average of 0.76%. The majority of mutations in the S ORF were expected in the S1 gene, particularly in the HVRs. Surprisingly, this was not the case. Only three SNPs (two missense and one silent) were found in the S1 gene, all in the COOH-terminal half of the S1 protein (Fig. 5). Only one mutation, a missense mutation, mapped to HVR3, and no SNPs were detected in HVR1 or HVR2. The S2 subunit was considerably more variable, containing 87% (20 of 23) of the polymorphisms detected across the entire S protein.

Fig. 7. The predicted locations of HVR1 (7a), HVR2 (7b) and HVR3 (7c) of QX-like IBV strain 3665/11, indicated as coloured side chains on the S1 subunit.

Two other notable features of S2 were detected. The first was a multi-A insertion site between nucleotides 22,794 and 22,795 in the genome. The polymorphism involved the insertion of either one or two adenine nucleotides, possibly through polymerase stuttering. The second region of interest lay just downstream of the multi-A insertion site. It comprised a stretch of three consecutive mutated amino acids, 889 C→W, 890 G→D and 891 S→C, followed by a silent mutation at 892 G (Fig. 5).
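The sketch below assigns spike residue numbers to S1 or S2 and to the HVRs, which makes the S1/S2 partition in the preceding paragraphs concrete. It relies on two sets of boundaries taken from this document. The first is the S1/S2 furin motif (517RRRR/S521), with cleavage assumed between residues 520 and 521. The second is the set of HVR boundaries quoted in the Introduction (56-69, 117-131 and 274-387). Those HVR boundaries were defined on reference strains and may not align exactly with 3665/11 numbering. `codon_of`, `locate` and `shifts_frame` are hypothetical helpers, and nucleotide positions are taken relative to the start of the S ORF. The last helper shows why a one- or two-adenine insertion such as the multi-A polymorphism shifts the reading frame.

```python
# Assumed boundaries; reference-strain numbering may differ slightly from 3665/11.
HVRS = {"HVR1": (56, 69), "HVR2": (117, 131), "HVR3": (274, 387)}
S1_LAST_RESIDUE = 520  # assumed cleavage between R520 and S521 (517-RRRR/S-521)


def codon_of(nt_pos_in_orf: int) -> int:
    """1-based amino acid number for a 1-based nucleotide position in the S ORF."""
    return (nt_pos_in_orf - 1) // 3 + 1


def locate(aa_pos: int):
    """Return (subunit, HVR name or None) for a spike residue number."""
    subunit = "S1" if aa_pos <= S1_LAST_RESIDUE else "S2"
    hvr = next((name for name, (lo, hi) in HVRS.items() if lo <= aa_pos <= hi), None)
    return subunit, hvr


def shifts_frame(indel_len: int) -> bool:
    """An indel shifts the reading frame unless its length is a multiple of 3."""
    return indel_len % 3 != 0


if __name__ == "__main__":
    for aa in (60, 120, 300, 668, 889, 890, 891):
        print(aa, locate(aa))
    print("1-A insertion shifts frame:", shifts_frame(1))
    print("2-A insertion shifts frame:", shifts_frame(2))
    print(f"S2 share of S-ORF SNPs: {100 * 20 / 23:.0f}%")  # 20 of 23 SNPs -> 87%
```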

Template-based protein structure modelling was used to predict the structure of the IBV spike monomer, using the available crystal structures of the MERS-CoV S1 and S2 subunits as templates (Fig. 6). S1 and S2 were modelled separately in RaptorX and then superposed. The IBV S1 structure was arranged as two beta barrels, and S2 formed packed α-helices. The S2 model was incomplete, and the transmembrane domain was not represented because no sufficiently similar structures were available to model this region. Nevertheless, this is the first model of the IBV spike protein monomer.

HVR1 and the putative receptor-binding domain map to the apical beta barrel (Fig. 7(a)). HVR2 is located on the flat plane at the base of the apical beta barrel and on the peptide connecting it to the basal beta barrel (Fig. 7(b)). HVR3 maps to a region in the basal beta barrel of S1 that was predicted to contact or interact with S2 (Fig. 7(c)).

The locations of the missense mutations detected by SNP analysis in the S1 and S2 subunits are indicated in Fig. 6. Many of these SNPs mapped to codons for amino acids on the surface of the predicted structure, but two regions were notable. Firstly, the highly variable region in S2 spanning amino acids 889-901 was exposed on the S2 stalk, although folding of the remainder of the COOH-terminal domain may have influenced this conformation. Secondly, 668 Ile was exposed on a projection at the top of the monomer. This residue precedes the second furin cleavage site in the S2 subunit, with the sequence PISSSGR/S (residues 667-674).

Cleavage of the S1/S2 furin motif (RRRR/S, residues 517-521 in strain 3665/11) was found to be non-essential for attachment of IBV to the cell; rather, it promotes infectivity within the cell. In studies with the Beaudette IBV strain, the second furin cleavage site in the S2 subunit was required for furin-dependent entry and syncytium formation. The current hypothesis is that interplay between the S1 and S2 subunits determines virus attachment to specific receptors and hence the tissue tropism of the virus (Promkuntod et al., 2013). The exact biological roles of these mutation-prone areas in S2 remain to be determined experimentally.


4. Conclusions 

Archaeological remains of domestic chickens in Northeast China and the Indus Valley date back ~8000 years (West & Zhou, 1988). The CoV group has been estimated to have arisen around 8100 BC, and the Gammacoronaviruses diverged from it around 2800 BC (Woo et al., 2012). CoVs have therefore probably been co-evolving with their gallinaceous hosts for several thousand years. Indeed, Cook and co-authors (2012) state that “IBV is found everywhere that commercial chickens are kept”. Although IBV was discovered only some 80 years ago, the implication is that the serotypes we now observe result from hundreds, if not thousands, of years of genetic drift and recombination. Modern poultry farming, in which chickens are kept at high densities, has accelerated this process, as has inter-regional trade in poultry and other avian species.

Studies on antigenic diversity of IBVs are heavily biased
towards the S1 gene, and the HVRs in particular
(Cavanagh, 2007; Ducatez et al., 2009; Kant et al., 1992; Mork
et al., 2014). Many of these studies cite frequent point mutations
in the S1 gene, but this was not the finding of the present study.
The discovery of a novel 3'-to-5' exoribonuclease activity in CoV
nsp14, a proofreading activity that regulates replication fidelity
and diversity in coronaviruses (Denison et al., 2011), lends weight
to the theory that genetic drift is not primarily responsible for
the degree of variation and the range of serotypes observed in
poultry today.

Instead, generation of variation by recombination is likely the 
main mechanism of serotypic diversity. The high frequency of 
RNA recombination in coronaviruses is likely caused by their 
unique mechanism of RNA synthesis, which involves discontinuous 
transcription and polymerase jumping (Jeong et al., 1996). 
Sequencing of many field strains has provided convincing evidence 
that many, possibly all, IBV strains are recombinants between
different field strains (Cavanagh, 2007; Kuo et al., 2013; Liu et al.,
2013; Hewson et al., 2011), driving IBV evolution at a population 
level. Recombination of distinct IBV strains has been 
experimentally demonstrated in vitro, in ovo and in vivo (Kottier 
et al., 1995; Wang et al., 1997). 

HVR1 of the S1 subunit contains the IBV receptor-binding site.
Therefore, despite the sequence variability in this region (which
includes insertions and deletions), diverse strains must retain this
critical biological function. All three HVRs may represent ancient
artefacts of recombination, which have been perpetuated because
they retain receptor-binding properties, with minimal permissive
amino acid changes. This theory contrasts with the tenet that the
HVRs in the S1 subunit are very tolerant of amino acid changes
produced by genetic drift, thereby conferring a selective advantage
(Cavanagh, 2007; De Wit, 2000; Kant et al., 1992; Koch et al., 1990).

Whereas S1 fulfils a primary role in receptor binding
(Promkuntod et al., 2014), a broader role of S2 in antigenicity 
and attachment to receptors is emerging. Chickens primed with a 
recombinant-expressed S2 subunit of a virulent ArkDPI strain 
and boosted with a live Mass-type vaccine were protected against 
challenge with live virulent ArkDPI virus (Toro et al., 2014).
Although S2 subunits most likely do not contain an additional
independent receptor-binding site, S2 in association with S1 forms
part of a specific ectodomain that is critical to the binding of the
virus to chicken tissues, implying that both S1 and S2 contain
determinants important to viral host range (Promkuntod et al.,
2013). The results of the present study demonstrate that S2 is more
predisposed to mutations than S1, providing an adaptive advantage;
at least one other study has also reported higher variability
in S2 compared with S1 (Mo et al., 2012).

IBV has not been as extensively studied as other CoVs, and little 
progress has been made in effectively controlling or eradicating the 
disease in poultry. Experimental and field studies provide substantial
evidence that the use of a homologous IBV vaccine is best, but
sometimes, intriguingly, protection can be offered by an unrelated 
vaccine, or by the use of two heterologous vaccines (Jones, 2010). 
Genotyping and phylogenetic analysis of IBV are typically focused
on the S1 subunit sequence, and Liu et al. (2014) caution against
drawing conclusions based on a single gene sequence, particularly 
a partial gene sequence. The IBV E and accessory proteins, and their
roles in the pathogenesis of IBV, have been completely overlooked,
even though the homologs of these proteins in other CoVs have been
shown to play significant roles. Accessory proteins of IBV and
other CoVs may also offer a new generation of vaccine targets: 
the use of codon-deoptimization of non-structural virulence genes 
in influenza A virus and respiratory syncytial virus resulted in 
genetically stable viruses that retained immunogenicity but were 
attenuated (Nogales et al., 2014; Meng et al., 2014). Evidently,
virulence and immunogenicity in IBV are multigenic traits, and future
studies must aim for a better understanding and exploitation
of the roles of the various viral proteins in the host if any
advances are to be made in controlling the disease in poultry.